feat(router): add Phase 2 semantic embeddings via @xenova/transformers #6

JustAGhosT merged 2 commits into main
Conversation
- embed.ts: singleton pipeline wrapper (Xenova/all-MiniLM-L6-v2)
- buildSemanticIndex(): generates normalised embeddings per team entry
- cosineSimilarity() + semanticSearch(): vector-similarity search with Phase 1 keyword fallback
- routeSemantic(): async router entry point, skips embedFn for off-topic queries
- data/embeddings.json: pre-built index for 13 standard Retort teams
- embeddings.yml: weekly scheduled CI regeneration (Sunday 03:00 UTC)
- 27 tests passing (14 Phase 1 + 13 new Phase 2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 Walkthrough

Adds semantic routing to the router package.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant RouteSemantic as "routeSemantic()"
    participant OffTopic as "isOffTopic()"
    participant SemanticSearch as "semanticSearch()"
    participant EmbedFn as "embedFn([query])"
    participant Cosine as "cosineSimilarity()"
    Client->>RouteSemantic: routeSemantic(query, index, embedFn)
    RouteSemantic->>OffTopic: isOffTopic(query)
    alt off-topic
        OffTopic-->>RouteSemantic: true
        RouteSemantic-->>Client: AskResponse (scopeViolation, results=[])
    else in-scope
        OffTopic-->>RouteSemantic: false
        RouteSemantic->>SemanticSearch: semanticSearch(query, entries, embedFn)
        SemanticSearch->>EmbedFn: embedFn([query])
        EmbedFn-->>SemanticSearch: query_embedding
        loop for each entry with embedding
            SemanticSearch->>Cosine: cosineSimilarity(query_emb, entry_emb)
            Cosine-->>SemanticSearch: similarity_score
        end
        SemanticSearch-->>RouteSemantic: RouteResult[] (sorted)
        RouteSemantic-->>Client: AskResponse (results, indexedAt)
    end
```
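The flow in the diagram can be sketched in TypeScript. This is a minimal illustrative sketch, not the package's actual API: the entry shape, the `isOffTopic()` placeholder, and the return type are all assumptions.

```typescript
// Minimal sketch of the routing flow shown in the diagram.
// All names and types here are illustrative, not the package's real API.
type Entry = { team: string; embedding?: number[] }
type RouteResult = { team: string; confidence: number }
type EmbedFn = (texts: string[]) => Promise<number[][]>

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}

// Placeholder scope guard; the real isOffTopic() is keyword-based.
function isOffTopic(query: string): boolean {
  return query.includes('weather')
}

async function routeSemantic(
  query: string,
  entries: Entry[],
  embedFn: EmbedFn,
): Promise<RouteResult[]> {
  // The scope guard runs first, so embedFn is never called for off-topic queries.
  if (isOffTopic(query)) return []

  const [queryEmbedding] = await embedFn([query])
  return entries
    .filter((e) => e.embedding !== undefined)
    .map((e) => ({
      team: e.team,
      confidence: cosineSimilarity(queryEmbedding, e.embedding as number[]),
    }))
    .sort((x, y) => y.confidence - x.confidence)
}
```

The early return for off-topic queries is what lets the router avoid loading the embedding model at all for out-of-scope input.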
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related issues
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c8aa792342
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```json
"optionalDependencies": {
  "@xenova/transformers": "^2.17.2"
```
Regenerate package-lock after adding transformers dependency
Adding `@xenova/transformers` in the workspace package without updating the root `package-lock.json` breaks every workflow step that uses `npm ci`. In this repo, `npm ci` now exits with EUSAGE (`Missing: @xenova/transformers@ from lock file`), so CI and the new embeddings workflow fail before build/test can run.
```typescript
const entriesWithEmbeddings = entries.filter((e) => e.embedding !== undefined)

// Fall back to keyword search when no embeddings are available.
if (entriesWithEmbeddings.length === 0) {
  return search(query, entries)
```
Include non-embedded entries in semantic fallback results
This path only falls back to keyword search when no entries have embeddings; if the index is partially populated (for example, stale/generated data missing a newly added team), entries without embedding are silently dropped from all semantic queries. That makes valid teams unroutable until embeddings are fully regenerated, even though keyword data is available.
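One possible remedy is to score non-embedded entries through the keyword path and merge the two ranked lists. The sketch below is hypothetical: `keywordScore`, the entry shape, and the merge strategy are stand-ins for illustration, not the package's real code.

```typescript
// Hypothetical merge: semantic scores for embedded entries,
// keyword scores for the rest, combined into one ranked list.
type Entry = { team: string; keywords: string[]; embedding?: number[] }
type Scored = { team: string; confidence: number }

// Stand-in for the Phase 1 keyword scorer.
function keywordScore(query: string, entry: Entry): number {
  const words = query.toLowerCase().split(/\s+/)
  const hits = words.filter((w) => entry.keywords.includes(w)).length
  return words.length === 0 ? 0 : hits / words.length
}

function mergeResults(
  query: string,
  entries: Entry[],
  semanticScores: Map<string, number>,
): Scored[] {
  return entries
    .map((e) => ({
      team: e.team,
      // Embedded entries keep their semantic score; entries without an
      // embedding fall back to keyword scoring instead of being dropped.
      confidence: semanticScores.get(e.team) ?? keywordScore(query, e),
    }))
    .sort((a, b) => b.confidence - a.confidence)
}
```

Mixing the two score scales would need calibration in practice, but even a crude merge keeps newly added teams routable while their embeddings are pending regeneration.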
```typescript
for (let i = 0; i < a.length; i++) {
  dot += a[i] * b[i]
  normA += a[i] * a[i]
  normB += b[i] * b[i]
```
Guard cosine similarity against vector length mismatch
The loop assumes equal-length vectors and multiplies b[i] for every i in a; when callers provide an embedFn that returns a different dimensionality than stored entry embeddings, b[i] becomes undefined, producing NaN confidences and unstable ranking. This should validate dimensions (or fail fast) instead of returning corrupted scores.
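The NaN behaviour is easy to reproduce with a standalone copy of the unguarded loop (illustrative sketch; the function name is made up for the demo):

```typescript
// Standalone copy of the unguarded loop: a shorter second vector
// poisons the accumulators with undefined, so the result is NaN.
function cosineSimilarityUnguarded(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i] // b[i] is undefined once i >= b.length
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}
```

Because `NaN` compares false against every threshold and sorts unpredictably, a single mismatched entry can silently scramble the whole ranking rather than failing loudly.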
Actionable comments posted: 2
🧹 Nitpick comments (5)
packages/router/src/index-builder.ts (1)
59-68: Extract duplicate text-building logic and validate embedding count.

The text representation logic (lines 59-61) duplicates the pattern from `buildIndex` (lines 17-22). Additionally, the code assumes `embeddings.length === teams.length` without validation.

♻️ Proposed refactor to reduce duplication and add safety check

```diff
+/** Build embedding text for a team (shared between tokenisation and embedding). */
+function buildTeamText(team: TeamLike): string {
+  return [team.name, team.focus, ...team.scope, ...team.accepts].join(' ')
+}
+
 export function buildIndex(teams: TeamLike[]): RouterIndex {
   const entries: RouterEntry[] = teams.map((team) => {
-    const raw = [
-      team.name,
-      team.focus,
-      ...team.scope,
-      ...team.accepts,
-    ].join(' ')
+    const raw = buildTeamText(team)
     return { ... }
   })
   ...
 }

 export async function buildSemanticIndex(...) {
   ...
   // Compute text representations for embedding.
-  const texts = teams.map((team) =>
-    [team.name, team.focus, ...team.scope, ...team.accepts].join(' '),
-  )
+  const texts = teams.map(buildTeamText)

   const embeddings = await doEmbed(texts)
+  if (embeddings.length !== base.entries.length) {
+    throw new Error(`Embedding count mismatch: expected ${base.entries.length}, got ${embeddings.length}`)
+  }
+
   const entries: RouterEntry[] = base.entries.map((entry, i) => ({ ... }))
   ...
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `packages/router/src/index-builder.ts` around lines 59-68: the text-building logic for teams is duplicated; extract it into a single helper (e.g. `buildTeamText`) so both places construct texts the same way (using `team.name`, `team.focus`, `...team.scope`, `...team.accepts`), then call `doEmbed(texts)` and immediately validate that `embeddings.length === base.entries.length` before mapping into `RouterEntry`; if the counts differ, throw a clear error and abort to avoid misaligned embeddings when creating entries.

packages/router/scripts/generate-embeddings.mjs (1)
122-130: Log the actual import error for easier debugging.

The catch block swallows the error details. If the import fails for reasons other than a missing build (e.g., a syntax error in the built code), the generic message won't help diagnose the issue.

♻️ Proposed fix to include error details

```diff
 try {
   const mod = await import('../dist/index.js')
   buildSemanticIndex = mod.buildSemanticIndex
-} catch {
+} catch (err) {
   console.error(
     'ERROR: Could not import dist/index.js. Run `npm run build -w @retort-plugins/router` first.',
   )
+  console.error('Underlying error:', err)
   process.exit(1)
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `packages/router/scripts/generate-embeddings.mjs` around lines 122-130: the catch block that imports '../dist/index.js' currently swallows the thrown error; capture it (e.g. `catch (err)`) and include the error details in the console.error output alongside the existing guidance, while still calling `process.exit(1)`.

.github/workflows/embeddings.yml (1)
26-32: Consider adding branch protection handling and explicit permissions.

The workflow pushes directly to the default branch, which may fail if branch protection rules require PRs. Additionally, the workflow relies on default token permissions.
- If branch protection is enabled, the push will fail silently (the job won't fail due to the conditional commit).
- Consider creating a PR instead of direct push, or document that branch protection must allow bot pushes.
- Add an explicit `permissions` block for clarity.

♻️ Suggested improvement with explicit permissions

```diff
 jobs:
   regenerate:
     runs-on: ubuntu-latest
+    permissions:
+      contents: write
     steps:
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `.github/workflows/embeddings.yml` around lines 26-32: update the "Commit if changed" step to (1) declare explicit workflow permissions (e.g. `permissions.contents: write`) so the intent is clear, (2) stop relying on a silent git push — either create a PR via the GitHub CLI (`gh`) when changes are staged or explicitly fail the job when the push is rejected by branch protection, and (3) modify that step's `git add` / `git commit` / `git push` commands to create a PR or surface push failures rather than quietly doing nothing under branch protection rules.

packages/router/src/embed.ts (1)
42-47: Batch process embeddings for better efficiency.

The current implementation calls the pipeline once for each text, making N async calls to embed N texts. The `@xenova/transformers` pipeline supports batch processing — you can pass all texts in a single call and get all embeddings back in one result. With `pooling: 'mean'`, the output is a flat Float32Array of shape `[N × embedding_dim]`, which needs to be sliced per text.

♻️ Proposed batch processing refactor

```diff
 export async function embed(texts: string[]): Promise<number[][]> {
   const extractor = await getPipeline()
-
-  const results: number[][] = []
-  for (const text of texts) {
-    const output = await extractor([text], { pooling: 'mean', normalize: true })
-    results.push(Array.from(output.data))
-  }
-  return results
+
+  if (texts.length === 0) return []
+
+  // Process all texts in a single batch
+  const output = await extractor(texts, { pooling: 'mean', normalize: true })
+
+  // output.data is a flat Float32Array of shape [batch_size, embedding_dim]
+  const embeddingDim = 384 // all-MiniLM-L6-v2 produces 384-dim embeddings
+  const results: number[][] = []
+  for (let i = 0; i < texts.length; i++) {
+    const start = i * embeddingDim
+    results.push(Array.from(output.data.slice(start, start + embeddingDim)))
+  }
+  return results
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `packages/router/src/embed.ts` around lines 42-47: the loop makes N calls to `extractor`; instead call `extractor` once with the full texts array (`await extractor(texts, { pooling: 'mean', normalize: true })`), compute the embedding dimension as `output.data.length / texts.length`, then slice the flat `output.data` into per-text segments and push `Array.from(...)` of each segment into `results`, so it becomes an array of `number[]` per input text.

packages/router/src/router.test.ts (1)
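The slicing arithmetic in that batch refactor can be checked in isolation. This is an illustrative helper, not the package's code; deriving the dimension from the buffer length (as the AI prompt suggests) avoids hard-coding 384.

```typescript
// Split a flat [N × dim] buffer into per-text embeddings, as the
// batched extractor call would return with pooling: 'mean'.
// Illustrative helper; the real code works on the pipeline's output.data.
function splitEmbeddings(flat: Float32Array, count: number): number[][] {
  if (count === 0) return []
  const dim = flat.length / count
  if (!Number.isInteger(dim)) {
    throw new Error(`Flat length ${flat.length} is not divisible by count ${count}`)
  }
  const out: number[][] = []
  for (let i = 0; i < count; i++) {
    out.push(Array.from(flat.slice(i * dim, (i + 1) * dim)))
  }
  return out
}
```

Computing `dim` from the buffer also turns any model/constant mismatch into an immediate error instead of silently mis-sliced vectors.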
166-231: Add regression tests for invalid embedding shapes.

Current semantic tests miss two edge cases: dimension mismatch and empty `embedFn` output. Adding these will lock in behavior and prevent runtime regressions.

Suggested test additions

```diff
 describe('semanticSearch', () => {
+  it('handles mismatched embedding dimensions deterministically', async () => {
+    const badEntries = [
+      { ...mockEntries[0], embedding: [1, 0] }, // shorter vector
+      { ...mockEntries[1], embedding: [1, 0, 0] },
+    ]
+    await expect(
+      semanticSearch('api backend', badEntries, mockEmbedFn),
+    ).rejects.toThrow()
+  })
+
+  it('falls back safely when embedFn returns no query embedding', async () => {
+    const fn = vi.fn(async () => [])
+    await expect(
+      semanticSearch('api backend', mockEntries, fn),
+    ).resolves.toBeDefined()
+  })
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `packages/router/src/router.test.ts` around lines 166-231: add two regression tests inside the existing `describe('semanticSearch')` block to cover invalid embedding shapes: (1) a test where embeddings have mismatched dimensionality, asserting `semanticSearch` throws (or rejects) with a dimension-mismatch error; (2) a test where `embedFn` resolves to an empty array, asserting `semanticSearch` throws or rejects for empty embed output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 24ce9a41-00b2-4ec6-bae6-c0a939607ede
📒 Files selected for processing (11)
- .github/workflows/embeddings.yml
- packages/router/data/embeddings.json
- packages/router/package.json
- packages/router/scripts/generate-embeddings.mjs
- packages/router/src/embed.ts
- packages/router/src/index-builder.ts
- packages/router/src/index.ts
- packages/router/src/router.test.ts
- packages/router/src/router.ts
- packages/router/src/search.ts
- packages/router/src/types.ts
```typescript
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0

  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }

  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}
```
Add a vector length guard to prevent NaN similarities.
Line 52 assumes `a` and `b` are equal length. If they diverge, `b[i]` becomes `undefined`, and confidence can become `NaN`, which breaks ranking determinism.
Proposed fix
```diff
 export function cosineSimilarity(a: number[], b: number[]): number {
+  if (a.length !== b.length) {
+    throw new Error('Embedding vectors must have the same length')
+  }
+
   let dot = 0
   let normA = 0
   let normB = 0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/router/src/search.ts` around lines 47 - 60, The cosineSimilarity
function assumes input vectors a and b are the same length which can lead to NaN
when b[i] is undefined; update cosineSimilarity to guard against mismatched
lengths by either validating that a.length === b.length and throwing or
returning 0, or by iterating only to Math.min(a.length, b.length) and treating
missing components as zero, and ensure the denom check still returns 0 when
norms are zero to avoid NaN; reference the cosineSimilarity function and
variables a, b, dot, normA, normB, and denom when making the change.
```typescript
const [queryEmbedding] = await embedFn([query])

return entriesWithEmbeddings
  .map((entry) => ({
    entry,
    confidence: cosineSimilarity(queryEmbedding, entry.embedding as number[]),
```
Validate embedFn output before scoring.
Line 85 destructures the first embedding without checking shape/length. If embedFn returns [] (or malformed output), Line 90 can throw at runtime.
Proposed fix
```diff
   const [queryEmbedding] = await embedFn([query])
+  if (!queryEmbedding || queryEmbedding.length === 0) {
+    return search(query, entries)
+  }

   return entriesWithEmbeddings
+    .filter((entry) => Array.isArray(entry.embedding) && entry.embedding.length === queryEmbedding.length)
     .map((entry) => ({
       entry,
       confidence: cosineSimilarity(queryEmbedding, entry.embedding as number[]),
     }))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/router/src/search.ts` around lines 85 - 90, The code destructures
the first embedding from embedFn without validating its result, which can cause
a runtime error if embedFn returns an empty or malformed array; after calling
embedFn([query]) (where queryEmbedding is currently extracted), validate that
the returned array has at least one element and that the element is a number[]
of the expected length (or shape), and if not either throw a clear, descriptive
error or return a safe fallback (e.g., empty results) before mapping
entriesWithEmbeddings and calling cosineSimilarity; update error messages to
include context (query and embedFn result) and ensure downstream code only runs
when queryEmbedding passes the checks.
…en CI workflow

- npm install to add @xenova/transformers entries to package-lock.json (fixes CI)
- embed.ts: process whole batch in single pipeline call (N serial calls → 1); add empty-array guard; extract EMBEDDING_DIM = 384 constant
- generate-embeddings.mjs: log underlying error in catch block for easier debug
- embeddings.yml: add explicit permissions (contents: write, pull-requests: write); open a PR via gh when direct push is rejected by branch protection instead of silently doing nothing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- `embed.ts` — singleton `@xenova/transformers` pipeline (Xenova/all-MiniLM-L6-v2), model cached on first use
- `buildSemanticIndex()` — generates normalised embeddings per team, stored as `embedding?: number[]` on `RouterEntry`
- `cosineSimilarity()` + `semanticSearch()` — vector-similarity search; falls back to Phase 1 keyword search if entries have no embeddings
- `routeSemantic()` — async entry point; skips `embedFn` call entirely for off-topic queries (scope guard runs first)
- `data/embeddings.json` — pre-built index for 13 standard Retort teams
- `.github/workflows/embeddings.yml` — weekly scheduled regeneration (Sunday 03:00 UTC) + `workflow_dispatch`

Test plan

- `npm run build -w @retort-plugins/router` — clean
- `npm run typecheck --workspaces --if-present` — zero errors
- `data/embeddings.json` generated with real embeddings for all 13 teams

Notes

- `@xenova/transformers` is an `optionalDependency` — Phase 1 keyword routing works without it
- `routeSemantic()` with an `embedFn`; existing `route()` is unchanged

🤖 Generated with Claude Code
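A consumer might wire up the opt-in path like this. This is hypothetical caller code, not part of the PR: the `loadEmbedFn` helper and its module-name parameter are assumptions for illustration, though `pipeline('feature-extraction', ...)` with `{ pooling: 'mean', normalize: true }` reflects the model usage described above.

```typescript
// Hypothetical consumer: load the optional embedder if installed,
// otherwise return undefined and stay on Phase 1 keyword routing.
type EmbedFn = (texts: string[]) => Promise<number[][]>

async function loadEmbedFn(moduleName: string): Promise<EmbedFn | undefined> {
  try {
    // The embedder is an optionalDependency; this import rejects
    // cleanly when the module is not installed.
    const { pipeline } = await import(moduleName)
    const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')
    return async (texts: string[]) => {
      const output = await extractor(texts, { pooling: 'mean', normalize: true })
      // output.data is a flat buffer; slice it into per-text vectors.
      const dim = output.data.length / texts.length
      return texts.map((_, i) =>
        Array.from(output.data.slice(i * dim, (i + 1) * dim)) as number[],
      )
    }
  } catch {
    return undefined // caller falls back to the keyword-only route()
  }
}
```

If `loadEmbedFn` resolves to `undefined`, callers keep using `route()`; otherwise they pass the returned function to `routeSemantic()`.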